A max-margin model for efficient simultaneous alignment and folding of RNA sequences

نویسندگان

  • Chuong B. Do
  • Chuan-Sheng Foo
  • Serafim Batzoglou
چکیده

MOTIVATION The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous applications in bioinformatics, ranging from microarray probe selection to de novo non-coding RNA gene prediction. In this work, we present RAF (RNA Alignment and Folding), an efficient algorithm for simultaneous alignment and consensus folding of unaligned RNA sequences. Algorithmically, RAF exploits sparsity in the set of likely pairing and alignment candidates for each nucleotide (as identified by the CONTRAfold or CONTRAlign programs) to achieve an effectively quadratic running time for simultaneous pairwise alignment and folding. RAF's fast sparse dynamic programming, in turn, serves as the inference engine within a discriminative machine learning algorithm for parameter estimation. RESULTS In cross-validated benchmark tests, RAF achieves accuracies equaling or surpassing the current best approaches for RNA multiple sequence secondary structure prediction. However, RAF requires nearly an order of magnitude less time than other simultaneous folding and alignment methods, thus making it especially appropriate for high-throughput studies. AVAILABILITY Source code for RAF is available at:http://contra.stanford.edu/contrafold/.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Relation Between RNA Sequences, Structures, and Shapes via Variation Networks

Background: RNA plays key role in many aspects of biological processes and its tertiary structure is critical for its biological function. RNA secondary structure represents various significant portions of RNA tertiary structure. Since the biological function of RNA is concluded indirectly from its primary structure, it would be important to analyze the relations between the RNA sequences and t...

متن کامل

Discriminative Structured Models for Biological Sequence Analysis a Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Making predictions is a key element in many computational biology applications: given a set of input biological sequences, use an inference procedure to generate some corresponding predicted output. The prediction process involves defining an appropriate scoring model for comparing alternative output predictions, developing efficient inference algorithms for choosing high-scoring outputs, and c...

متن کامل

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...

متن کامل

Molecular and Bioinformatics Analysis of Allelic Diversity in IGFBP2 Gene Promoter in Indigenous Makuee and Lori-Bakhtiari Sheep Breeds

The aim of this study was to perform molecular and bioinformatics analysis of IGFBP2 gene promoter in association with some economic traits in indigenous Makuee (MS) and Lori-Bakhtiari (LB) breeds. DNA was extracted from blood samples of 120 MS and 200 LB and a 297 bp fragment from the upstream sequences of studied gene was amplified and genotyped by single-strand conformational polymo...

متن کامل

A comparative phylogenetic analysis of Theileria spp. by using two two "18S ribosomal RNA" and "Theileria annulata merozoite surface antigen" gene sequences

More than 185 species, strains and unclassified Theileria parasites are categorized in the Entrez Taxonomy. The accurate diagnosis and proper identification of the causative agents are important for understanding the epidemiology, prevention and appropriate treatment. This study aims to discuss the importance of two genes of Theileria annulata 18S ribosomal RNA (18S rRNA) and Theileria annulata...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 24  شماره 

صفحات  -

تاریخ انتشار 2008